{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Azure AI Search CSV integrated vectorization sample\n",
    "\n",
    "This Python notebook demonstrates the [integrated vectorization](https://learn.microsoft.com/azure/search/vector-search-integrated-vectorization) and [CSV indexing](https://learn.microsoft.com/en-us/azure/search/search-howto-index-csv-blobs) features of Azure AI Search that are currently in public preview. \n",
    "\n",
    "Integrated vectorization takes a dependency on indexers and skillsets and the AzureOpenAIEmbedding skill and your Azure OpenAI resorce for embedding.\n",
    "\n",
    "This example uses a CSV from the `csv_data` folder for chunking, embedding, indexing, and queries.\n",
    "\n",
    "### Prerequisites\n",
    "\n",
    "+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access).\n",
    " \n",
    "+ Azure AI Search, any tier, but we recommend Basic or higher for this workload. [Enable semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run a hybrid query with semantic ranking.\n",
    "\n",
    "+ A deployment of the `text-embedding-3-large` model on Azure OpenAI.\n",
    "\n",
    "+ A deployment of the `gpt-4o` or `gpt-4o-mini` model on Azure OpenAI. \n",
    "\n",
    "+ Azure Blob Storage. This notebook connects to your storage account and loads a container with the sample CSV.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up a Python virtual environment in Visual Studio Code\n",
    "\n",
    "1. Open the Command Palette (Ctrl+Shift+P).\n",
    "1. Search for **Python: Create Environment**.\n",
    "1. Select **Venv**.\n",
    "1. Select a Python interpreter. Choose 3.10 or later.\n",
    "\n",
    "It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install -r csv-indexer-requirements.txt --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load .env file (Copy .env-sample to .env and update accordingly)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "from azure.identity import DefaultAzureCredential\n",
    "from azure.core.credentials import AzureKeyCredential\n",
    "import os\n",
    "\n",
    "load_dotenv(override=True) # take environment variables from .env.\n",
    "\n",
    "# Variables not used here do not need to be updated in your .env file\n",
    "endpoint = os.environ[\"AZURE_SEARCH_SERVICE_ENDPOINT\"]\n",
    "# You do not need a key if you are using keyless authentication\n",
    "# To learn more, please visit https://learn.microsoft.com/azure/search/search-security-rbac\n",
    "credential = AzureKeyCredential(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\")) if os.getenv(\"AZURE_SEARCH_ADMIN_KEY\") else DefaultAzureCredential()\n",
    "index_name = os.getenv(\"AZURE_SEARCH_INDEX\", \"csv-vec\")\n",
    "blob_connection_string = os.environ[\"BLOB_CONNECTION_STRING\"]\n",
    "# search blob datasource connection string is optional - defaults to blob connection string\n",
    "# This field is only necessary if you are using MI to connect to the data source\n",
    "# https://learn.microsoft.com/azure/search/search-howto-indexing-azure-blob-storage#supported-credentials-and-connection-strings\n",
    "search_blob_connection_string = os.getenv(\"SEARCH_BLOB_DATASOURCE_CONNECTION_STRING\", blob_connection_string)\n",
    "blob_container_name = os.getenv(\"BLOB_CONTAINER_NAME\", \"csv-vec\")\n",
    "azure_openai_endpoint = os.environ[\"AZURE_OPENAI_ENDPOINT\"]\n",
    "# You do not need a key if you are using keyless authentication\n",
    "# To learn more, please visit https://learn.microsoft.com/azure/search/search-howto-managed-identities-data-sources and https://learn.microsoft.com/azure/developer/ai/keyless-connections\n",
    "azure_openai_key = os.getenv(\"AZURE_OPENAI_KEY\")\n",
    "azure_openai_embedding_deployment = os.getenv(\"AZURE_OPENAI_EMBEDDING_DEPLOYMENT\", \"text-embedding-3-large\")\n",
    "azure_openai_model_name = os.getenv(\"AZURE_OPENAI_EMBEDDING_MODEL_NAME\", \"text-embedding-3-large\")\n",
    "azure_openai_model_dimensions = int(os.getenv(\"AZURE_OPENAI_EMBEDDING_DIMENSIONS\", 1024))\n",
    "# NOTE: The chat deployment should support JSON Schema\n",
    "# To learn more, please see\n",
    "# https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs#supported-models\n",
    "azure_openai_chat_deployment = os.getenv(\"AZURE_OPENAI_CHATGPT_DEPLOYMENT\", \"gpt-4o\")\n",
    "azure_openai_api_version = os.getenv(\"AZURE_OPENAI_API_VERSION\", \"2024-10-21\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to Blob Storage and load documents\n",
    "\n",
    "Retrieve documents from Blob Storage. You can use the sample documents in the data/documents folder.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Setup sample data in demo-container\n"
     ]
    }
   ],
   "source": [
    "from azure.storage.blob import BlobServiceClient  \n",
    "import glob\n",
    "\n",
    "def upload_sample_documents(\n",
    "        blob_connection_string: str,\n",
    "        blob_container_name: str,\n",
    "        use_user_identity: bool = True\n",
    "    ):\n",
    "    # Connect to Blob Storage\n",
    "    blob_service_client = BlobServiceClient.from_connection_string(conn_str=blob_connection_string, credential=DefaultAzureCredential() if use_user_identity else None)\n",
    "    container_client = blob_service_client.get_container_client(blob_container_name)\n",
    "    if not container_client.exists():\n",
    "        container_client.create_container()\n",
    "\n",
    "    documents_directory = \"csv_data\"\n",
    "    csv_files = glob.glob(os.path.join(documents_directory, '*.csv'))\n",
    "    for file in csv_files:\n",
    "        with open(file, \"rb\") as data:\n",
    "            name = os.path.basename(file)\n",
    "            if not container_client.get_blob_client(name).exists():\n",
    "                container_client.upload_blob(name=name, data=data)\n",
    "\n",
    "upload_sample_documents(\n",
    "    blob_connection_string=blob_connection_string,\n",
    "    blob_container_name=blob_container_name,\n",
    "    # Set to false if you want to use credentials included in the blob connection string\n",
    "    # Otherwise your identity will be used as credentials\n",
    "    use_user_identity=True\n",
    ")\n",
    "print(f\"Setup sample data in {blob_container_name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a blob data source connector on Azure AI Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data source 'my-demo-index-blob' created or updated\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes import SearchIndexerClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    SearchIndexerDataContainer,\n",
    "    SearchIndexerDataSourceConnection,\n",
    "    SoftDeleteColumnDeletionDetectionPolicy\n",
    ")\n",
    "\n",
    "# Create a data source\n",
    "# NOTE: To remove records from a search index, add a column to the row \"IsDeleted\" set to \"True\". The next indexer run will remove this record\n",
    "# To learn more please visit https://learn.microsoft.com/en-us/azure/search/search-howto-index-one-to-many-blobs\n",
    "indexer_client = SearchIndexerClient(endpoint, credential)\n",
    "container = SearchIndexerDataContainer(name=blob_container_name)\n",
    "data_source_connection = SearchIndexerDataSourceConnection(\n",
    "    name=f\"{index_name}-blob\",\n",
    "    type=\"azureblob\",\n",
    "    connection_string=search_blob_connection_string,\n",
    "    container=container,\n",
    "    data_deletion_detection_policy=SoftDeleteColumnDeletionDetectionPolicy(soft_delete_column_name=\"IsDeleted\", soft_delete_marker_value=\"True\")\n",
    ")\n",
    "data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)\n",
    "\n",
    "print(f\"Data source '{data_source.name}' created or updated\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a search index\n",
    "\n",
    "Vector and nonvector content is stored in a search index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "my-demo-index created\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes import SearchIndexClient\n",
    "from azure.search.documents.indexes.models import (\n",
    "    SearchField,\n",
    "    SearchFieldDataType,\n",
    "    VectorSearch,\n",
    "    HnswAlgorithmConfiguration,\n",
    "    VectorSearchProfile,\n",
    "    AzureOpenAIVectorizer,\n",
    "    AzureOpenAIVectorizerParameters,\n",
    "    SemanticConfiguration,\n",
    "    SemanticSearch,\n",
    "    SemanticPrioritizedFields,\n",
    "    SemanticField,\n",
    "    SearchIndex\n",
    ")\n",
    "\n",
    "# Create a search index\n",
    "# NOTE: You must adjust these fields based on your CSV Schema.\n",
    "# There is no chunking of the description or title fields in this sample.\n",
    "# There is a separate AzureSearch_DocumentKey for the key automatically generated by the indexer\n",
    "# Learn more at https://learn.microsoft.com/en-us/azure/search/search-howto-index-csv-blobs\n",
    "index_client = SearchIndexClient(endpoint=endpoint, credential=credential)  \n",
    "fields = [  \n",
    "    SearchField(name=\"AzureSearch_DocumentKey\",  key=True, type=SearchFieldDataType.String),\n",
    "    SearchField(name=\"ID\", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=False),  \n",
    "    SearchField(name=\"Name\", type=SearchFieldDataType.String, filterable=True),  \n",
    "    SearchField(name=\"Age\", type=SearchFieldDataType.Int32, sortable=True, filterable=True, facetable=False),  \n",
    "    SearchField(name=\"Title\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),\n",
    "    SearchField(name=\"Description\", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),\n",
    "    SearchField(name=\"TitleVector\", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=azure_openai_model_dimensions, vector_search_profile_name=\"myHnswProfile\"),\n",
    "    SearchField(name=\"DescriptionVector\", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=azure_openai_model_dimensions, vector_search_profile_name=\"myHnswProfile\"),\n",
    "]  \n",
    "  \n",
    "# Configure the vector search configuration  \n",
    "vector_search = VectorSearch(  \n",
    "    algorithms=[  \n",
    "        HnswAlgorithmConfiguration(name=\"myHnsw\"),\n",
    "    ],  \n",
    "    profiles=[  \n",
    "        VectorSearchProfile(  \n",
    "            name=\"myHnswProfile\",  \n",
    "            algorithm_configuration_name=\"myHnsw\",\n",
    "            vectorizer_name=\"myOpenAI\",\n",
    "        )\n",
    "    ],  \n",
    "    vectorizers=[  \n",
    "        AzureOpenAIVectorizer(  \n",
    "            vectorizer_name=\"myOpenAI\",  \n",
    "            parameters=AzureOpenAIVectorizerParameters(  \n",
    "                resource_url=azure_openai_endpoint,  \n",
    "                deployment_name=azure_openai_embedding_deployment,\n",
    "                model_name=azure_openai_model_name,\n",
    "                api_key=azure_openai_key,\n",
    "            ),\n",
    "        ),  \n",
    "    ],  \n",
    ")  \n",
    "  \n",
    "semantic_config = SemanticConfiguration(  \n",
    "    name=\"my-semantic-config\",  \n",
    "    prioritized_fields=SemanticPrioritizedFields(\n",
    "        title_field=SemanticField(field_name=\"Title\"),\n",
    "        content_fields=[SemanticField(field_name=\"Description\")]  \n",
    "    ),  \n",
    ")\n",
    "\n",
    "# Create the semantic search with the configuration  \n",
    "semantic_search = SemanticSearch(configurations=[semantic_config])  \n",
    "  \n",
    "# Create the search index\n",
    "index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search)  \n",
    "result = index_client.create_or_update_index(index)  \n",
    "print(f\"{result.name} created\")  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a skillset\n",
    "\n",
    "Skills drive integrated vectorization. [AzureOpenAIEmbedding](https://learn.microsoft.com/azure/search/cognitive-search-skill-azure-openai-embedding) handles calls to Azure OpenAI, using the connection information you provide in the environment variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "my-demo-index-skillset created\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes.models import (\n",
    "    InputFieldMappingEntry,\n",
    "    OutputFieldMappingEntry,\n",
    "    AzureOpenAIEmbeddingSkill,\n",
    "    SearchIndexerSkillset\n",
    ")\n",
    "\n",
    "# Create a skillset  \n",
    "skillset_name = f\"{index_name}-skillset\"\n",
    "  \n",
    "title_embedding_skill = AzureOpenAIEmbeddingSkill(  \n",
    "    description=\"Skill to generate title embeddings via Azure OpenAI\",  \n",
    "    context=\"/document\",  \n",
    "    resource_url=azure_openai_endpoint,  \n",
    "    deployment_name=azure_openai_embedding_deployment,  \n",
    "    model_name=azure_openai_model_name,\n",
    "    dimensions=azure_openai_model_dimensions,\n",
    "    api_key=azure_openai_key,  \n",
    "    inputs=[  \n",
    "        InputFieldMappingEntry(name=\"text\", source=\"/document/Title\"),  \n",
    "    ],  \n",
    "    outputs=[  \n",
    "        OutputFieldMappingEntry(name=\"embedding\", target_name=\"TitleVector\")  \n",
    "    ],  \n",
    ")\n",
    "\n",
    "description_embedding_skill = AzureOpenAIEmbeddingSkill(  \n",
    "    description=\"Skill to generate description embeddings via Azure OpenAI\",  \n",
    "    context=\"/document\",  \n",
    "    resource_url=azure_openai_endpoint,  \n",
    "    deployment_name=azure_openai_embedding_deployment,  \n",
    "    model_name=azure_openai_model_name,\n",
    "    dimensions=azure_openai_model_dimensions,\n",
    "    api_key=azure_openai_key,  \n",
    "    inputs=[  \n",
    "        InputFieldMappingEntry(name=\"text\", source=\"/document/Description\"),  \n",
    "    ],  \n",
    "    outputs=[  \n",
    "        OutputFieldMappingEntry(name=\"embedding\", target_name=\"DescriptionVector\")  \n",
    "    ],  \n",
    ")  \n",
    "\n",
    "skills = [title_embedding_skill, description_embedding_skill]\n",
    "\n",
    "skillset = SearchIndexerSkillset(  \n",
    "    name=skillset_name,  \n",
    "    description=\"Skillset to generate embeddings\",  \n",
    "    skills=skills\n",
    ")\n",
    "  \n",
    "client = SearchIndexerClient(endpoint, credential)  \n",
    "client.create_or_update_skillset(skillset)  \n",
    "print(f\"{skillset.name} created\")  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create an indexer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "my-demo-index-indexer is created and running. If queries return no results, please wait a bit and try again.\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents.indexes.models import (\n",
    "    SearchIndexer,\n",
    "    FieldMapping,\n",
    "    FieldMappingFunction,\n",
    "    IndexingParameters,\n",
    "    IndexingParametersConfiguration,\n",
    "    BlobIndexerParsingMode\n",
    ")\n",
    "\n",
    "# Create an indexer  \n",
    "indexer_name = f\"{index_name}-indexer\"  \n",
    "indexer_parameters = IndexingParameters(\n",
    "        configuration=IndexingParametersConfiguration(\n",
    "            parsing_mode=BlobIndexerParsingMode.DELIMITED_TEXT,\n",
    "            query_timeout=None,\n",
    "            first_line_contains_headers=True))\n",
    "\n",
    "indexer = SearchIndexer(  \n",
    "    name=indexer_name,  \n",
    "    description=\"Indexer to index documents and generate embeddings\",  \n",
    "    skillset_name=skillset_name,  \n",
    "    target_index_name=index_name,  \n",
    "    data_source_name=data_source.name,\n",
    "    parameters=indexer_parameters,\n",
    "    field_mappings=[FieldMapping(source_field_name=\"AzureSearch_DocumentKey\", target_field_name=\"AzureSearch_DocumentKey\", mapping_function=FieldMappingFunction(name=\"base64Encode\"))],\n",
    "    output_field_mappings=[\n",
    "        FieldMapping(source_field_name=\"/document/TitleVector\", target_field_name=\"TitleVector\"),\n",
    "        FieldMapping(source_field_name=\"/document/DescriptionVector\", target_field_name=\"DescriptionVector\")\n",
    "    ]\n",
    ")  \n",
    "\n",
    "indexer_client = SearchIndexerClient(endpoint, credential)  \n",
    "indexer_result = indexer_client.create_or_update_indexer(indexer)  \n",
    "  \n",
    "# Run the indexer  \n",
    "indexer_client.run_indexer(indexer_name)  \n",
    "print(f'{indexer_name} is created and running. If queries return no results, please wait a bit and try again.')  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a hybrid search\n",
    "\n",
    "This example shows a hybrid vector search using the vectorizable text query, all you need to do is pass in text and your vectorizer will handle the query vectorization.\n",
    "Ask a zoo employment related question that can be answered just using the title and description fields"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Score: 0.03333333507180214\n",
      "ID: 7\n",
      "Name: Mary Wilson\n",
      "Title: Aquarist\n",
      "Description: Maintains aquatic exhibits\n",
      "Score: 0.032522473484277725\n",
      "ID: 10\n",
      "Name: James Anderson\n",
      "Title: Groundskeeper\n",
      "Description: Maintains zoo grounds\n",
      "Score: 0.03201844170689583\n",
      "ID: 16\n",
      "Name: Mason Thompson\n",
      "Title: Maintenance Worker\n",
      "Description: Handles maintenance and repairs\n"
     ]
    }
   ],
   "source": [
    "from azure.search.documents import SearchClient\n",
    "from azure.search.documents.models import VectorizableTextQuery\n",
    "\n",
    "# Pure Vector Search\n",
    "query = \"Cleans fish tanks\"\n",
    "  \n",
    "search_client = SearchClient(endpoint, index_name, credential=credential)\n",
    "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"TitleVector,DescriptionVector\")\n",
    "# Use the below query to pass in the raw vector query instead of the query vectorization\n",
    "# vector_query = RawVectorQuery(vector=generate_embeddings(query), k_nearest_neighbors=50, fields=\"vector\")\n",
    "  \n",
    "results = search_client.search(  \n",
    "    search_text=query,  \n",
    "    vector_queries= [vector_query],\n",
    "    select=[\"ID\", \"Name\", \"Title\", \"Description\"],\n",
    "    top=3\n",
    ")  \n",
    "  \n",
    "for result in results:\n",
    "    print(f\"Score: {result['@search.score']}\")  \n",
    "    print(f\"ID: {result['ID']}\")  \n",
    "    print(f\"Name: {result['Name']}\")  \n",
    "    print(f\"Title: {result['Title']}\")\n",
    "    print(f\"Description: {result['Description']}\")   \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer questions that require data analysis\n",
    "\n",
    "Some questions require a deeper understanding of the data schema. For example, the question \"Which employees are older than 40?\" requires using [filtering](https://learn.microsoft.com/en-us/azure/search/search-filters) and \"Who is the youngest employee\" requires using [sorting](https://learn.microsoft.com/en-us/azure/search/search-pagination-page-layout). Use your [chat deployment](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/completions) to create the correct Azure Search query to answer the question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azure.search.documents import SearchClient\n",
    "from azure.search.documents.models import VectorizableTextQuery\n",
    "from openai import AzureOpenAI\n",
    "from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n",
    "from pydantic import BaseModel, Field\n",
    "import pandas as pd\n",
    "import json\n",
    "from typing import Optional\n",
    "\n",
    "openai_credential = DefaultAzureCredential()\n",
    "token_provider = get_bearer_token_provider(openai_credential, \"https://cognitiveservices.azure.com/.default\")\n",
    "\n",
    "client = AzureOpenAI(\n",
    "    api_version=azure_openai_api_version,\n",
    "    azure_endpoint=azure_openai_endpoint,\n",
    "    api_key=azure_openai_key,\n",
    "    azure_ad_token_provider=token_provider if not azure_openai_key else None\n",
    ")\n",
    "\n",
    "# See https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs for more information\n",
    "# NOTE: Updating the tool definition with specific examples related to your data will help improve the accuracy.\n",
    "class QueryOptions(BaseModel):\n",
    "    \"\"\"\n",
    "    Given a question, get any additional Azure Search query parameters required to answer the question. If no additional query parameters are required to answer the question, don't return any.\n",
    "    \"\"\"\n",
    "    orderBy: Optional[str] = Field(description=\"Specify a custom sort order for search results. Format is a comma-separated list of up to 32 order-by clauses. If a direction is not specified, the default is ascending. Example: ID, Age desc, Title asc\")\n",
    "    filter: Optional[str] = Field(description=\"Specify inclusion or exclusion criteria for search results. Format is an Azure Search OData boolean expression. Example: Age le 4 or not (Age gt 8). Do not use filters for text expressions, only numeric ones\")\n",
    "    search: Optional[str] = Field(description=\"Specify a query string used to search text and vectors in an Azure Search index in order to answer the provided question. If no query string is required to answer the question, return * or no query string at all\")\n",
    "\n",
    "# Specifically instruct the model to only use filterable fields when creating query options\n",
    "filterable_fields = \", \".join([field.name for field in fields if field.filterable])\n",
    "query_options_system_prompt = f\"Create options for Azure Search queries. If you are creating filters, you may only use the following fields: {filterable_fields}. Only generate filters when you are trying to answer a question involving numbers.\"\n",
    "def get_query_options(query: str) -> QueryOptions:\n",
    "    response = client.beta.chat.completions.parse(\n",
    "        model=azure_openai_chat_deployment,\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": query_options_system_prompt},\n",
    "            {\"role\": \"user\", \"content\": query}\n",
    "        ],\n",
    "        response_format=QueryOptions\n",
    "    )\n",
    "    return response.choices[0].message.parsed\n",
    "\n",
    "\n",
    "answer_query_results_system_prompt = f\"The following question requires search results to provide an answer. Use the provided search results to answer the question. If you can't answer the question using the search results, say I don't know.\"\n",
    "search_client = SearchClient(endpoint, index_name, credential=credential)\n",
    "def answer_query(query: str) -> str:\n",
    "    # Parse the query options returned by the model\n",
    "    query_options = get_query_options(query)\n",
    "    query_option_search = query_options.search\n",
    "    vector_queries = None\n",
    "    if query_option_search and query_option_search != \"*\":\n",
    "        vector_queries = [VectorizableTextQuery(text=query_option_search, k_nearest_neighbors=50, fields=\"TitleVector,DescriptionVector\")]\n",
    "\n",
    "    query_option_order_by = query_options.orderBy\n",
    "    order_by = None\n",
    "    query_type = None\n",
    "    semantic_configuration_name = None\n",
    "    if query_option_order_by:\n",
    "        try:\n",
    "            order_by = query_option_order_by.split(\",\")\n",
    "        except:\n",
    "            order_by = None\n",
    "\n",
    "    # This sample only uses specific fields to answer questions. Update these fields for your own data\n",
    "    columns = [\"ID\", \"Age\", \"Name\", \"Title\", \"Description\"]\n",
    "    search_results = search_client.search(\n",
    "        search_text=query_option_search,\n",
    "        vector_queries=vector_queries,\n",
    "        top=5,\n",
    "        order_by=order_by,\n",
    "        query_type=query_type,\n",
    "        semantic_configuration_name=semantic_configuration_name,\n",
    "        filter=query_options.filter,\n",
    "        select=columns\n",
    "    )\n",
    "\n",
    "    # Convert the search results to markdown for use by the model\n",
    "    results = [ { column: result[column] for column in columns } for result in search_results ]\n",
    "    results_markdown_table = pd.DataFrame(results).to_markdown(index=False)\n",
    "\n",
    "    response = client.chat.completions.create(\n",
    "        model=azure_openai_chat_deployment,\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": answer_query_results_system_prompt},\n",
    "            {\"role\": \"user\", \"content\": results_markdown_table },\n",
    "            {\"role\": \"user\", \"content\": query}\n",
    "        ]\n",
    "    )\n",
    "    # Return the generated answer, query options, and results table for analysis\n",
    "    return response.choices[0].message.content, query_options, results_markdown_table\n",
    "\n",
    "def print_answer(answer, query_options, results):\n",
    "    print(\"Generated Answer:\", answer)\n",
    "    print(\"Generated Query Options:\", query_options)\n",
    "    print(\"Search Results\")\n",
    "    print(results)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer sample questions\n",
    "\n",
    "These questions may require filtering and sorting in addition to regular search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "answer, query_options, results = answer_query(\"Who is the youngest employee?\")\n",
    "print_answer(answer, query_options, results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generated Answer: The person who provides marketing updates on the zoo is William Harris, the Marketing Coordinator, who promotes zoo events and activities.\n",
      "Generated Query Options: orderBy=None filter=None search='marketing updates zoo'\n",
      "Search Results\n",
      "|   ID |   Age | Name           | Title                 | Description                          |\n",
      "|-----:|------:|:---------------|:----------------------|:-------------------------------------|\n",
      "|   14 |    32 | William Harris | Marketing Coordinator | Promotes zoo events and activities   |\n",
      "|   11 |    26 | Olivia Thomas  | Facilities Manager    | Manages zoo facilities               |\n",
      "|    4 |    23 | Robert Brown   | Tour Guide            | Guides visitors through the zoo      |\n",
      "|   10 |    43 | James Anderson | Groundskeeper         | Maintains zoo grounds                |\n",
      "|    1 |    64 | John Doe       | Zookeeper             | Cares for animals and their habitats |\n"
     ]
    }
   ],
   "source": [
    "answer, query_options, results = answer_query(\"Who provides marketing updates on the zoo?\")\n",
    "print_answer(answer, query_options, results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generated Answer: David Moore is the youngest employee older than 40, with an age of 41.\n",
      "Generated Query Options: orderBy='Age asc' filter='Age gt 40' search='*'\n",
      "Search Results\n",
      "|   ID |   Age | Name               | Title                   | Description                              |\n",
      "|-----:|------:|:-------------------|:------------------------|:-----------------------------------------|\n",
      "|    8 |    41 | David Moore        | Curator                 | Oversees animal exhibits and collections |\n",
      "|   18 |    43 | Ethan Martinez     | Fundraising Coordinator | Organizes fundraising events             |\n",
      "|   10 |    43 | James Anderson     | Groundskeeper           | Maintains zoo grounds                    |\n",
      "|   12 |    44 | Daniel Jackson     | Guest Services          | Assists visitors and handles inquiries   |\n",
      "|   19 |    44 | Charlotte Robinson | Event Planner           | Plans and coordinates events             |\n"
     ]
    }
   ],
   "source": [
    "answer, query_options, results = answer_query(\"Of the employees who are older than 40, who is the youngest?\")\n",
    "print_answer(answer, query_options, results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generated Answer: The employee with the first name Alice is Alice Johnson.\n",
      "Generated Query Options: orderBy=None filter=None search='Alice'\n",
      "Search Results\n",
      "|   ID |   Age | Name               | Title                    | Description                          |\n",
      "|-----:|------:|:-------------------|:-------------------------|:-------------------------------------|\n",
      "|    3 |    23 | Alice Johnson      | Animal Trainer           | Trains animals for performances      |\n",
      "|    4 |    23 | Robert Brown       | Tour Guide               | Guides visitors through the zoo      |\n",
      "|   19 |    44 | Charlotte Robinson | Event Planner            | Plans and coordinates events         |\n",
      "|    1 |    64 | John Doe           | Zookeeper                | Cares for animals and their habitats |\n",
      "|   17 |    59 | Mia Garcia         | Administrative Assistant | Supports administrative tasks        |\n"
     ]
    }
   ],
   "source": [
    "answer, query_options, results = answer_query(\"Who are the employees who's first name is Alice?\")\n",
    "print_answer(answer, query_options, results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generated Answer: I don't know. The provided data does not include an employee named Scarlett.\n",
      "Generated Query Options: orderBy=None filter=None search='Scarlett'\n",
      "Search Results\n",
      "|   ID |   Age | Name               | Title                 | Description                          |\n",
      "|-----:|------:|:-------------------|:----------------------|:-------------------------------------|\n",
      "|   14 |    32 | William Harris     | Marketing Coordinator | Promotes zoo events and activities   |\n",
      "|    3 |    23 | Alice Johnson      | Animal Trainer        | Trains animals for performances      |\n",
      "|    1 |    64 | John Doe           | Zookeeper             | Cares for animals and their habitats |\n",
      "|   19 |    44 | Charlotte Robinson | Event Planner         | Plans and coordinates events         |\n",
      "|   15 |    21 | Isabella Martin    | Researcher            | Conducts research on wildlife        |\n"
     ]
    }
   ],
   "source": [
    "answer, query_options, results = answer_query(\"Is there an employee named Scarlett?\")\n",
    "print_answer(answer, query_options, results)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
